Low power cache architecture

ABSTRACT

In a processor cache, cache circuits are mapped into one or more logical modules. Each module may be powered down independently of other modules in response to microinstructions processed by the cache. Power control may be applied on a microinstruction-by-microinstruction basis. Because the microinstructions determine which modules are used, power savings may be achieved by powering down those modules that are not used. A cache layout organization may be modified to distribute a limited number of ways across addressable cache banks. By associating fewer than the total number of ways with a bank (for example, one or two ways), the size of memory clusters within the bank may be reduced. This reduction in memory cluster size reduces the power needed for an address decoder to address sets within the bank.

CROSS-REFERENCE TO RELATED APPLICATION

This application is a divisional application that claims the benefit of U.S. patent application Ser. No. 09/749,750 (filed Dec. 28, 2000; now U.S. Pat. No. 6,845,432), which is incorporated herein in its entirety.

BACKGROUND

The present invention relates to a cache architecture that contributes to reduced power consumption of integrated circuits. More particularly, the cache architecture permits units within the cache to be disabled on a microinstruction-by-microinstruction basis.

Issues of power consumption have become increasingly important for the design of integrated circuits. The power consumption of integrated circuits, particularly that of processors, has increased over the years with the historical increase in clock speeds. Modern processors now consume so much power that the heat they generate has become destructive. The increase in power consumption also contributes to reduced battery life in mobile computing applications.

Power management techniques are commonplace in the modern computer. Users of domestic personal computers recognize that computer monitors, disk drives and the like are disabled when not in use. However, such techniques are not able to keep pace with the ever-increasing power demands of newer generations of integrated circuits. Accordingly, there remains a need in the art for an integrated circuit architecture that contributes to reduced power consumption of the integrated circuit.

An internal cache is perhaps the largest functional unit within a processor. In the Pentium Pro® processor, commercially available from Intel Corporation, an L2 cache may have a capacity to store 2 MB of data and may occupy approximately 60% of the processor's area when manufactured as an integrated circuit. If a power control technique could be applied to a processor's cache, it could achieve considerable power savings for the chip overall.

It is known to disable (e.g., power down) a processor cache when it is not in use in order to save power. Such a technique, however, disables the cache entirely and can be used only when the cache has no operation to perform. As is known, a cache may have a high utilization rate. Accordingly, the known power conservation techniques for caches do not achieve significant reductions in power consumption.

No known power control scheme for a processor cache permits the cache to conserve power when in operation.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a processor cache according to an embodiment of the present invention.

FIG. 2 is a state diagram illustrating operation of the cache manager in accordance with an embodiment of the present invention.

FIG. 3 is a state diagram illustrating operation of the cache manager in accordance with an embodiment of the present invention.

FIG. 4 is a state diagram illustrating response to a read request in accordance with an embodiment of the present invention.

FIG. 5 is a block diagram illustrating cache organization according to conventional techniques.

FIG. 6 is a block diagram illustrating cache organization according to an embodiment of the present invention.

DETAILED DESCRIPTION

Embodiments of the present invention provide a processor cache in which cache circuits are mapped into one or more logical modules. Each module may be powered down independently of other modules. Caches typically operate according to microinstructions. For many microinstructions, certain modules will not participate in the cache operations. Power savings may be achieved by powering down modules on a microinstruction-by-microinstruction basis.

FIG. 1 is a block diagram of a processor cache according to an embodiment of the present invention. According to the embodiment, the cache 100 may be populated by a plurality of cache entries 110. The embodiment of FIG. 1 illustrates the cache entries 110 organized into sets and ways. As shown in FIG. 1, way 0 may include a predetermined number N of cache entries. Each cache entry 110 may be organized into a tag field 120, a data field 130 and a state field (labeled “S”). Data to be stored in the cache 100 may be stored in the data fields 130 of the cache entries 110. Partial address information identifying a source of the data stored in a data field 130 may be stored in the tag field 120. Cache coherency state information may be stored in the state field S. For simplicity's sake, the architecture of a single way (way 0) is illustrated in FIG. 1; the architecture of way 0 may extend to the other ways 1, 2, etc. In practice, a cache may include fewer or more ways than the four ways shown in FIG. 1.

The cache 100 may include an address decoder 140. The address decoder 140 may have an input for a portion of address data (labeled ADDR_(set)). The address decoder 140 may have a plurality of outputs, one connected to each of the N cache entries 110 in each of the ways. The decoder 140, therefore, may define “sets” within the cache 100 (referenced with respect to output lines from the decoder 140). The decoder output labeled “set0,” for example, may be connected to one cache entry 110 from each of the ways in the cache 100. Similarly, every other output line from the decoder 140 may be connected to one cache entry 110 from each way. Some processors may provide separate decoders for the tag fields 120 and data fields 130 of a single cache entry 110 (embodiment not shown). In response to an address signal ADDR_(set), the decoder 140 may enable W cache entries 110, where W represents the number of ways in the cache 100.

Each way may include a comparator 150 having a first input coupled to the tag fields 120 of the cache entries 110 within the respective way. The comparator 150 also may have a second input coupled to a second portion of an address input, labeled ADDR_(tag). When in use, stored tag information from one of the cache entries 110 will be output to the comparator 150 in response to an output from the decoder 140. The comparator 150 may generate an output which is enabled when the stored tag data matches the ADDR_(tag) input. The output from a comparator 150 may control a transmission gate 135 on the output from the data fields 130 to permit data read from the data fields 130 to propagate from the way in response to a tag match.
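The decoder-plus-comparator lookup described above can be modeled logically in a few lines. This is a minimal sketch under stated assumptions, not the circuit of FIG. 1: the class and field names are hypothetical, data is an integer, and a single `valid` flag stands in for the coherency state held in field S.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    """One cache entry 110: tag field, data field, and a simplified state."""
    tag: int = 0
    data: int = 0
    valid: bool = False  # hypothetical stand-in for the coherency state in field S


class SetAssociativeCache:
    """Toy logical model of FIG. 1 (hypothetical names, not the patent's circuit)."""

    def __init__(self, num_sets: int, num_ways: int) -> None:
        # ways[w][s] is the entry of way w wired to decoder output "set s".
        self.ways: List[List[Entry]] = [
            [Entry() for _ in range(num_sets)] for _ in range(num_ways)
        ]

    def lookup(self, addr_set: int, addr_tag: int) -> Optional[int]:
        """Return the data on a hit, or None on a miss."""
        # Address decoder: one output enables one entry in each of the W ways.
        selected = [way[addr_set] for way in self.ways]
        for entry in selected:
            # Per-way comparator 150: a tag match with valid state opens gate 135.
            if entry.valid and entry.tag == addr_tag:
                return entry.data
        return None


cache = SetAssociativeCache(num_sets=4, num_ways=2)
cache.ways[1][3] = Entry(tag=0x2A, data=99, valid=True)
print(cache.lookup(3, 0x2A))  # 99
print(cache.lookup(3, 0x2B))  # None
```

In hardware the W comparisons happen in parallel; the loop here only models the outcome.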

The cache 100 also may include a victim allocation unit 160, populated by a plurality of eviction control registers (LRUs) 170. The victim allocation unit 160 may have an LRU 170 for each set in the cache 100. When new data is to be written to a particular set (say, set0) in the cache 100 and all ways of that set contain valid data, data must be evicted from one of the ways before the new data may be stored. For each set, the LRUs 170 identify one of the ways (called the “victim way”) as the way whose data will be evicted next. Typically, eviction occurs on a least-recently-used, round-robin or other conventional basis.
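A per-set least-recently-used policy, one of the conventional bases mentioned above, can be sketched as follows. This assumes true LRU kept as an ordered list per set. Real LRU registers 170 are typically compact (often pseudo-LRU) bit fields, so the class and method names here are hypothetical illustrations only.

```python
from typing import List


class VictimAllocationUnit:
    """Per-set true-LRU order; index 0 holds the current victim way."""

    def __init__(self, num_sets: int, num_ways: int) -> None:
        self.order: List[List[int]] = [list(range(num_ways)) for _ in range(num_sets)]

    def touch(self, set_index: int, way: int) -> None:
        """Record a use of `way` in `set_index` (moves it to most-recently-used)."""
        order = self.order[set_index]
        order.remove(way)
        order.append(way)

    def victim(self, set_index: int) -> int:
        """Least-recently-used way of the set: the next one to be evicted."""
        return self.order[set_index][0]


vau = VictimAllocationUnit(num_sets=2, num_ways=4)
vau.touch(0, 0)
vau.touch(0, 1)
print(vau.victim(0))  # 2
print(vau.victim(1))  # 0 (set 1 untouched)
```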

A cache 100 may include a cache manager 180, which typically is a state machine. The cache manager 180 may receive data requests from other processor components (not shown) relating to the reading or writing of data to/from the cache 100. The cache manager 180 may operate in accordance with several micro-operations (commonly, “UOPs”) to accomplish this function. Known UOPs include:

- Evict, causing data to be evicted from a cache entry 110;
- Read Line, LRU Update (“RLU”), causing data to be read from an addressed cache entry and updating the LRU register 170 based on the new use;
- Tag Inquiry (“TI”), reading tag data 120 from addressed cache entries 110 for comparison with input ADDR_(tag) data;
- Tag Read/Data Read (“TRR”), reading the tag field 120 and data field 130 of addressed cache entries 110;
- Tag Write (“TW”), writing data to the tag field 120;
- Tag Write/Data Read (“TWR”), reading data 130 from a cache entry 110 and writing data to the tag field 120 thereof; and
- Tag Write/Data Write (“TWW”), writing data to the tag field 120 and data field 130 of an addressed cache entry 110.

The cache manager 180 typically manages the operation of the cache 100 to fulfill the data requests via control paths to each field 120, 130, S of each cache entry 110 (not shown in FIG. 1 for simplicity).

The cache 100 operates in synchronism with a system clock. Individual components within the cache (the cache entries 110, decoder 140, comparator 150, LRUs 170) operate in accordance with clock signals derived from the system clock. A clock distribution network 200 is provided within the processor to distribute the clock signals throughout the cache. In this regard, the operation and structure of a set-associative cache are well known.

According to an embodiment, the cache manager 180 may interact with the clock distribution network 200 to selectively enable or disable logical modules in response to UOPs. For example, the clock distribution network 200 may be coupled to each tag field 120, data field 130 and state field S of each of the ways by a respective clock line C. In an embodiment, transmission gates G may be provided on the clock lines C. The transmission gates G may be controlled by the cache manager 180 to selectively enable or disable the clock lines. Clock lines to cache entries 110 in a first way (say, way 0) may be controlled independently of clock lines to cache entries 110 in a second way (way 1). A clock line C of the victim allocation unit 160 similarly may be provided with a transmission gate G under control of the cache manager 180.

Table 1 below identifies the settings of each module for each UOP:

TABLE 1

Microinstruction (UOP)         Tag 120  Data 130  State S  LRU 170  Number of Ways
Evict                          On       On        On       Off      One
Read Line, LRU Update (“RLU”)  On       On        On       On       All
Tag Inquiry (“TI”)             On       Off       On       Off      All
Tag Read/Data Read (“TRR”)     On       On        On       Off      One
Tag Write (“TW”)               On       Off       On       Off      One
Tag Write/Data Read (“TWR”)    On       On        On       Off      One
Tag Write/Data Write (“TWW”)   On       On        On       Off      One

By disabling cache components such as ways, tag fields 120, data fields 130, state fields S and LRUs 170 on a UOP-by-UOP basis, the cache architecture 100 achieves considerable power conservation over known alternatives.
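Table 1 can be read as a lookup from each UOP to the clock-line transmission gates G that the cache manager 180 opens. The following is a minimal Python sketch. It assumes one gate per (field, way) pair plus one shared gate for the victim allocation unit, and it interprets "One" as "only the target way." The names `TABLE_1` and `enabled_clock_lines` are hypothetical.

```python
from typing import Dict, Set, Tuple

# Table 1: UOP -> (tag 120, data 130, state S, LRU 170, number of ways)
TABLE_1: Dict[str, Tuple[bool, bool, bool, bool, str]] = {
    "Evict": (True, True, True, False, "one"),
    "RLU":   (True, True, True, True,  "all"),
    "TI":    (True, False, True, False, "all"),
    "TRR":   (True, True, True, False, "one"),
    "TW":    (True, False, True, False, "one"),
    "TWR":   (True, True, True, False, "one"),
    "TWW":   (True, True, True, False, "one"),
}

FIELDS = ("tag", "data", "state")


def enabled_clock_lines(uop: str, target_way: int, num_ways: int) -> Set[Tuple[str, int]]:
    """Return the (module, way) clock lines whose gates G are opened for `uop`."""
    tag, data, state, lru, scope = TABLE_1[uop]
    ways = range(num_ways) if scope == "all" else [target_way]
    lines = {(name, w) for w in ways
             for name, on in zip(FIELDS, (tag, data, state)) if on}
    if lru:
        lines.add(("lru", -1))  # -1 marks the shared victim allocation unit
    return lines


print(len(enabled_clock_lines("RLU", 0, 4)))     # 13: every array plus the LRU
print(sorted(enabled_clock_lines("TW", 2, 4)))   # [('state', 2), ('tag', 2)]
```

In a four-way cache the RLU UOP leaves all 13 gates open, while TW opens only 2. That gap is the per-UOP saving the table describes.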

FIG. 2 is a state diagram illustrating operation of the cache manager 180 (FIG. 1) in accordance with an embodiment. The cache manager 180 may begin operation in an idle state 300 and transition out of the idle state 300 in response to a data request.

When the data request is a read operation, the cache manager 180 may generate an RLU UOP 310. The RLU UOP 310 may energize the tag fields 120, data fields 130 and state fields S of all ways in the cache 100 (FIG. 1) as well as the LRU 170 in the set identified by ADDR_(set) so that data may be read from them. As explained, tag data 120 is read from each way and compared with the ADDR_(tag) information. A cache hit/miss decision may be made from the comparison. A cache “hit” may be signaled when one of the comparators 150 indicates a match and state data S indicates the presence of valid data in the data field 130; it means that the requested data is located within the cache 100. Otherwise, the request is said to “miss” the cache.

The LRU 170 identifies the next way that will be selected as a victim. If a cache miss occurs, new data eventually may be stored in the cache 100. The cache manager 180 may determine from the state information of the victim way whether the victim is dirty. “Dirty” data is data that has been modified; it may be the most current copy of the data in the system. Dirty data is protected against destruction. If the victim data is dirty, the cache manager 180 may generate an Evict UOP 320 to evict the dirty data from the cache 100 without destroying it. During processing of the Evict UOP 320, only the victim way (say, way 0) may be enabled. All other ways and the victim allocation unit 160 may be powered down. Thereafter, or if the victim data was not dirty, the cache manager 180 may return to the idle state 300.
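The read path of FIG. 2 reduces to a short UOP sequence that depends on the hit result and on the victim's dirty state. This is a minimal sketch. The function name is hypothetical, and the idle state is implicit in the function's return.

```python
from typing import List


def fig2_read_uops(hit: bool, victim_dirty: bool) -> List[str]:
    """UOPs issued for a read request in the FIG. 2 flow (hypothetical helper)."""
    uops = ["RLU"]  # all ways' tag/data/state arrays plus the set's LRU
    if not hit and victim_dirty:
        uops.append("Evict")  # only the victim way is clocked
    return uops  # then return to idle state 300


print(fig2_read_uops(hit=False, victim_dirty=True))  # ['RLU', 'Evict']
```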

When the data request is a data replacement instruction, the cache manager 180 may issue a TWW UOP 330. The TWW UOP 330 may cause the tag field 120, the data field 130 and the state field S of a single way, called the “target way,” to be energized so that new data can be written to it. All other ways and the victim allocation unit 160 may be powered down during the TWW UOP 330. The cache manager 180 may identify the target way from the data replacement instruction itself. Thereafter, the cache manager 180 may return to the idle state 300.

When the data request is a write instruction issued pursuant to a data eviction from a higher-level cache (not shown), the cache manager 180 may issue a TI UOP 340. The TI UOP 340 may energize the tag fields 120 and state fields S of all ways in the cache 100. The data fields 130 throughout the cache 100 and the victim allocation unit 160 may be powered down during the TI UOP 340. The cache manager 180 may determine a hit or miss based upon outputs from the way comparators 150 and the state fields. If a hit occurs, the cache manager 180 may issue a TWW UOP 330, causing new data to be written to the tag field 120 and data field 130 of the cache entry 110 that caused the hit. All other ways and the victim allocation unit 160 may be powered down during the TWW UOP 330. Thereafter, the cache manager 180 may return to the idle state 300.

When the data request is a cache writeback invalidate instruction, the cache manager 180 may issue a TRR UOP 350. The TRR UOP 350 may cause the tag field 120, the data field 130 and the state field S of an addressed cache entry 110 in a victim way to be energized so that their contents may be read. All other ways may be powered down during processing of the TRR UOP 350. The cache manager 180 may identify the victim way from the data request itself. From the state field S, the cache manager 180 may determine whether the victim cache entry 110 is valid or invalid. If the victim cache entry 110 is valid, the cache manager 180 may generate a TW UOP 360, causing new state data designating an invalid state, and possibly tag information, to be written to the cache entry 110. All other ways and the victim allocation unit 160 may be powered down during processing of the TW UOP 360. Thereafter, or if the victim cache entry 110 was invalid, the cache manager 180 may return to the idle state 300.

If the cache 100 (FIG. 1) were positioned within a cache hierarchy that includes one or more higher-level caches, a data request that indicates a write of data pursuant to a Read For Ownership (“RFO”) request need not cause data to be written in the cache 100. Typically, the data is being written to the higher-level caches (not shown) so the data can be modified. Eventually, the data may be written to the cache 100 pursuant to an eviction from a higher-level cache. Accordingly, in an embodiment, when the data request is a write of data pursuant to an RFO, the cache manager 180 may generate a TW UOP 360, causing a write of tag and state information in the tag field 120 and state field S of a cache entry 110 in a single way. All other ways and the victim allocation unit 160 may be powered down during processing of the TW UOP 360. Thereafter, the cache manager 180 may return to the idle state 300.

If the data request is a cache invalidate command, the cache manager 180 also may generate a TW UOP 360. The TW UOP 360 may cause a write of tag and state information in the tag field 120 and state field S of a cache entry 110 in a single way. The cache manager 180 may determine the target way from the cache invalidate command itself. All other ways and the victim allocation unit 160 may be powered down during processing of the TW UOP 360. Thereafter, the cache manager 180 may return to the idle state 300.
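The non-read, non-snoop requests of FIG. 2 described above can be collected into a single dispatcher. This is a sketch under stated assumptions. The request labels (`"replace"`, `"eviction_write"`, and so on) are hypothetical names for the request types in the text. The text does not say what follows a miss on an eviction write, so this sketch simply returns to idle in that case.

```python
from typing import List


def fig2_uops(request: str, hit: bool = False, victim_valid: bool = False) -> List[str]:
    """UOP sequence for the non-read, non-snoop requests of FIG. 2 (sketch).

    The request labels are hypothetical names for the request types in the text.
    """
    if request == "replace":  # data replacement instruction
        return ["TWW"]  # target way only
    if request == "eviction_write":  # write caused by a higher-level eviction
        # TI probes all ways' tags/states; TWW writes the hitting entry.
        # The text does not describe the miss path; this sketch just goes idle.
        return ["TI", "TWW"] if hit else ["TI"]
    if request == "writeback_invalidate":
        # TRR reads the victim entry; TW marks it invalid only if it was valid.
        return ["TRR", "TW"] if victim_valid else ["TRR"]
    if request in ("rfo_write", "invalidate"):
        return ["TW"]  # tag and state of a single way
    raise ValueError(f"unhandled request type: {request!r}")


print(fig2_uops("eviction_write", hit=True))  # ['TI', 'TWW']
```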

If the data request is a snoop probe, the cache manager 180 may generate a TI UOP 370. The TI UOP 370 may cause data to be read from the tag fields 120 and state fields S of addressed cache entries 110 in all ways of the cache 100. The data fields 130 throughout the cache 100 and the victim allocation unit 160 may be powered down during the TI UOP 370. Following the TI UOP 370, if a way comparator 150 indicates a tag match, the cache manager 180 may generate a TW UOP 380 if the state information from the matching cache entry 110 indicates that the data were held in the exclusive state or if the snoop probe was not a Go-to-Shared snoop. The TW UOP 380 may update the data held in the state field S of the matching way. All other ways and the victim allocation unit 160 may be powered down during processing of the TW UOP 380. Thereafter, the cache manager 180 may return to the idle state 300. Following the TI UOP 370, if the state information from the matching cache entry 110 indicates that the data is invalid or modified, the cache manager 180 may return to the idle state 300.

If the data request is a snoop confirm command and no data is to be transferred via an implicit writeback, the cache manager 180 may generate a TW UOP 380. The TW UOP 380 may cause data to be written to the state field S of an addressed cache entry 110 in a single way of the cache 100. All other ways and the victim allocation unit 160 may be powered down during processing of the TW UOP 380. Thereafter, the cache manager 180 may return to the idle state 300.

If the data request is a snoop confirm command and data is to be transferred via an implicit writeback, the cache manager 180 may generate a TWR UOP 390. The TWR UOP 390 may cause data to be read from a data field 130 and new state data to be written to the state field S in a single way of the cache 100. All other ways and the victim allocation unit 160 may be powered down during processing of the TWR UOP 390. Thereafter, the cache manager 180 may return to the idle state 300.
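The snoop probe and snoop confirm paths of FIG. 2 are sketched below as one function. This is one reading of the text, under stated assumptions. The state names follow MESI-style labels ("exclusive", "shared", "modified", "invalid"), which is an assumption, since the text names only the exclusive, invalid and modified states. An invalid or modified match is treated as returning straight to idle, taking precedence over the Go-to-Shared test. All function and parameter names are hypothetical.

```python
from typing import List, Optional


def snoop_uops(kind: str, tag_match: bool = False, state: Optional[str] = None,
               go_to_shared: bool = False, implicit_writeback: bool = False) -> List[str]:
    """UOP sequence for snoop probes and snoop confirms in FIG. 2 (sketch)."""
    if kind == "probe":
        uops = ["TI"]  # tag and state arrays of all ways; data arrays and LRU off
        if not tag_match or state in ("invalid", "modified"):
            return uops  # straight back to idle
        if state == "exclusive" or not go_to_shared:
            uops.append("TW")  # rewrite state field S of the matching way only
        return uops
    if kind == "confirm":
        # Implicit writeback needs the data array (TWR); otherwise state only (TW).
        return ["TWR"] if implicit_writeback else ["TW"]
    raise ValueError(f"unhandled snoop kind: {kind!r}")


print(snoop_uops("probe", tag_match=True, state="shared", go_to_shared=True))  # ['TI']
```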

The foregoing description illustrates operation of a cache 100 that integrates the power control techniques of the present embodiments with the UOPs of known caches. It is believed that the foregoing embodiments achieve greater power conservation in a processor system than other known power control techniques.

Additional power conservation may be achieved in the cache architecture 100 by enhancing the UOPs employed by such a cache. Accordingly, the following description presents embodiments of the present invention that redefine cache UOPs and their uses to improve the power control techniques even further.

According to an embodiment, the cache manager 180 may interact with the clock distribution network 200 to selectively enable or disable logical modules in response to UOPs. Table 2 below identifies the settings of each module for each UOP:

TABLE 2

Microinstruction (UOP)         Tag 120  Data 130  State S  LRU 170  Number of Ways
Data Write (“DW”)              Off      On        On       Off      One
Evict                          On       On        On       Off      One
Read Line, LRU Update (“RLU”)  On       On        On       On       All
Tag Inquiry (“TI”)             On       Off       On       Off      All
Tag Read/Data Read (“TRR”)     On       On        On       Off      One
Tag Write (“TW”)               On       Off       On       Off      One
Tag Invalidate (“TWI”)         Off      Off       On       Off      One
Tag Write/Data Read (“TWR”)    On       On        On       Off      One
Tag Write/Data Write (“TWW”)   On       On        On       Off      One
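Table 2 adds the DW and TWI UOPs, which leave the tag field unclocked. The sketch below counts how many field arrays each UOP energizes. It assumes each enabled field of each enabled way counts as one array, and the LRU counts as one more. The counting rule and the names `TABLE_2` and `arrays_energized` are illustrative assumptions. They are not a power model from the source.

```python
from typing import Dict, Tuple

# Table 2: UOP -> (tag 120, data 130, state S, LRU 170, number of ways)
TABLE_2: Dict[str, Tuple[bool, bool, bool, bool, str]] = {
    "DW":    (False, True,  True,  False, "one"),
    "Evict": (True,  True,  True,  False, "one"),
    "RLU":   (True,  True,  True,  True,  "all"),
    "TI":    (True,  False, True,  False, "all"),
    "TRR":   (True,  True,  True,  False, "one"),
    "TW":    (True,  False, True,  False, "one"),
    "TWI":   (False, False, True,  False, "one"),
    "TWR":   (True,  True,  True,  False, "one"),
    "TWW":   (True,  True,  True,  False, "one"),
}


def arrays_energized(uop: str, num_ways: int) -> int:
    """Count enabled field arrays (plus the LRU) for one UOP."""
    tag, data, state, lru, scope = TABLE_2[uop]
    ways = num_ways if scope == "all" else 1
    return ways * (int(tag) + int(data) + int(state)) + int(lru)


# Eviction write in an 8-way cache: FIG. 2 uses TI + TWW, FIG. 3 uses TI + DW.
print(arrays_energized("TI", 8) + arrays_energized("TWW", 8))  # 19
print(arrays_energized("TI", 8) + arrays_energized("DW", 8))   # 18
```

Under this counting, replacing TW with TWI for state-only updates halves the arrays touched (2 to 1).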

FIG. 3 is a state diagram illustrating use of the UOPs, according to an embodiment of the present invention. According to this embodiment, the cache manager 180 may begin operation in an idle state 400 and transition out of the idle state 400 in response to a data request.

When the data request is a read operation, the cache manager 180 may generate an RLU UOP 410. The RLU UOP 410 may energize the tag fields 120, data fields 130 and state fields S of all ways in the cache 100 (FIG. 1) as well as the LRU 170 in the set identified by ADDR_(set) so that data may be read from them. A cache hit/miss decision may be made from tag comparisons in each of the ways. If a cache miss occurs, the cache manager 180 may determine, from the state information of a victim way (identified by the LRU 170), whether the victim data is dirty. If the victim data is dirty, the cache manager 180 may generate an Evict UOP 420 to evict the dirty data from the cache 100. The Evict UOP 420 may cause only one of the ways to be energized; all other ways and the victim allocation unit 160 may be powered down. Thereafter, or if the victim data was not dirty, the cache manager 180 may return to the idle state 400.

When the data request is a data replacement instruction, the cache manager 180 may issue a TWW UOP 430. The TWW UOP 430 may cause the tag field 120, the data field 130 and the state field S of a target way to be energized so that new data can be written to it. All other ways and the victim allocation unit 160 may be powered down during the TWW UOP 430. The cache manager 180 may identify the target way from the data replacement instruction itself. Thereafter, the cache manager 180 may return to the idle state 400.

When the data request is a write instruction issued pursuant to a data eviction from a higher-level cache (not shown), the cache manager 180 may issue a TI UOP 440. The TI UOP 440 may energize the tag fields 120 and state fields S of all ways in the cache 100. The data fields 130 throughout the cache 100 and the victim allocation unit 160 may be powered down during the TI UOP 440. The cache manager 180 may determine a hit or miss based upon outputs from the way comparators 150 and the state fields. If a hit occurs, the cache manager 180 may issue a DW UOP 430, causing new data to be written to the data field 130 and state field S of the cache entry 110 that caused the hit. The tag field 120 need not be energized. Further, all other ways and the victim allocation unit 160 may be powered down. Thereafter, the cache manager 180 may return to the idle state 400.

When the data request is a cache writeback invalidate instruction, the cache manager 180 may issue a TRR UOP 450. The TRR UOP 450 may cause the tag field 120, the data field 130 and the state field S of an addressed cache entry 110 in a victim way to be energized so that their contents may be read. All other ways may be powered down during processing of the TRR UOP 450. The cache manager 180 may identify a victim way from the data request itself. From the state field S, the cache manager 180 may determine whether the victim cache entry 110 is valid or invalid. If the victim cache entry 110 is valid, the cache manager 180 may generate a TWI UOP 460, causing new state information designating an invalid state to be written to the cache entry 110. The tag field 120 and the data field 130 in the victim way need not be energized. All other ways and the victim allocation unit 160 may be powered down during the TWI UOP 460. Thereafter, or if the victim cache entry 110 was invalid, the cache manager 180 may return to the idle state 400.

If the data request is a cache invalidate command, the cache manager 180 also may generate a TWI UOP 460. The TWI UOP 460 may cause new state information, designating an invalid state, to be written to the cache entry 110 of a single way. The cache manager 180 may determine the target way from the cache invalidate command itself. All other ways and the victim allocation unit 160 may be powered down during processing of the TWI UOP 460. Thereafter, the cache manager 180 may return to the idle state 400.

If the data request is a snoop probe, the cache manager 180 may generate a TI UOP 470. The TI UOP 470 may cause data to be read from the tag fields 120 and state fields S of addressed cache entries 110 in all ways of the cache 100. The data fields 130 throughout the cache 100 and the victim allocation unit 160 may be powered down during the TI UOP 470. Following the TI UOP 470, if a way comparator 150 indicates a tag match, the cache manager 180 may generate a TWI UOP 480 if the state information from the matching cache entry 110 indicates that the data were held in the exclusive state or if the snoop probe was not a Go-to-Shared snoop. The TWI UOP 480 may update the data held in the state field S of the matching cache entry 110. The tag field 120 and the data field 130 in the matching way may be powered down. Further, all other ways and the victim allocation unit 160 may be powered down. Thereafter, the cache manager 180 may return to the idle state 400. Following the TI UOP 470, if the state information from the matching cache entry 110 indicates that the data is invalid or modified, the cache manager 180 may return to the idle state 400.

If the data request is a snoop confirm command and no data is to be transferred via an implicit writeback, the cache manager 180 may generate a TWI UOP 480. The TWI UOP 480 may cause data to be written to the state field S of an addressed cache entry 110 in a single way of the cache 100. The tag field 120 and the data field 130 in the matching way may be powered down. Further, all other ways and the victim allocation unit 160 may be powered down. Thereafter, the cache manager 180 may return to the idle state 400.

If the data request is a snoop confirm command and data is to be transferred via an implicit writeback, the cache manager 180 may generate a TWR UOP 490. The TWR UOP 490 may cause data to be read from a data field 130 and new state data to be written to the state field S in a single way of the cache 100. All other ways and the victim allocation unit 160 may be powered down during processing of the TWR UOP 490. Thereafter, the cache manager 180 may return to the idle state 400.

When the data request is a write of data pursuant to an RFO, the cache manager 180 may generate a TW UOP 500, causing a write of tag and state information in the tag field 120 and state field S of a cache entry 110 in a single way. All other ways and the victim allocation unit 160 may be powered down during processing of the TW UOP 500. Thereafter, the cache manager 180 may return to the idle state 400.

In the foregoing embodiments, the cache manager 180 may generate an RLU UOP in response to a read request. The RLU UOP, as described, may cause cache entries 110 from all ways to be energized in their entirety (the tag field 120, the data field 130 and the state field S), in addition to the LRU 170 in the victim allocation unit 160. This permits a parallel read of all data fields 130 in the cache. If a comparator 150 and state data S indicate a valid tag match, the contents of the data field 130 of the matching way may be used without delay. Although the parallel read of data from all ways provides a very fast way of accessing requested data, it consumes unnecessary power: the data fields 130 of every non-matching way are read and then discarded.

FIG. 4 is a state diagram illustrating response to a read request in accordance with an embodiment of the present invention. In this embodiment, a cache manager may advance from an idle state 600 and issue a TI UOP 610 in response to the read request. As noted, a TI UOP 610 may cause data to be read from the tag fields 120 and state fields S of all ways in the cache 100. The data fields 130 may be powered down during the TI UOP 610. From the tag and state data, a determination may be made whether the request hit the cache.

If the request hit the cache 100, the cache manager 180 may generate a data read (“DR”) UOP 620, causing data to be read from the data field 130 in the one way that generated the tag match. During the DR UOP 620, the tag field 120 and state field S of the matching way may be powered down. Also, all other ways may be powered down. The victim allocation unit 160 may remain powered so that it can be updated. Thereafter, the cache manager 180 may return to the idle state 600.

If the data request missed the cache, the cache manager 180 may generate an “LRU Read” UOP 630, causing data to be read from the victim allocation unit 160. All ways may be powered down during the LRU Read UOP 630. The victim allocation unit 160 may identify a victim way. From the state information read in the TI UOP 610, the cache manager 180 may determine whether the victim data is dirty. If the victim data is dirty, the cache manager 180 may generate an Evict UOP 640. The Evict UOP 640 may occur as in prior embodiments. Thereafter, or if the victim data was not dirty, the cache manager 180 may return to the idle state 600.
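The FIG. 4 read path splits the single RLU into a tag-first sequence. This is a minimal sketch with a hypothetical function name. As the text describes, the dirty check uses state information already captured during the TI UOP.

```python
from typing import List


def fig4_read_uops(hit: bool, victim_dirty: bool) -> List[str]:
    """UOPs issued for a read request in the FIG. 4 flow (hypothetical helper)."""
    uops = ["TI"]  # tag and state arrays of all ways; data arrays off
    if hit:
        uops.append("DR")  # data array of the matching way only; LRU stays on
    else:
        uops.append("LRU Read")  # victim allocation unit only; all ways off
        if victim_dirty:  # judged from state read during TI
            uops.append("Evict")
    return uops  # then return to idle state 600


print(fig4_read_uops(hit=False, victim_dirty=True))  # ['TI', 'LRU Read', 'Evict']
```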

The embodiment of FIG. 4 consumes less power for read requests than the prior embodiments, at the cost of slightly increased latency for read requests. In the prior embodiments operating in accordance with an RLU UOP, reading data fields 130 from cache entries 110 that did not have a tag match wastes power. By contrast, the embodiment of FIG. 4 reads data from a data field 130 only after a tag match identifies a way that may store the requested data. Data is not read from the other ways.
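The trade-off can be made concrete with a back-of-the-envelope energy tally for a read hit. The per-array costs in `COST` are invented placeholders chosen only to show the bookkeeping. They are not from the source, and real ratios depend on array sizes and circuit details. The sketch charges the RLU path for every way's tag, data and state arrays plus the LRU. It charges the FIG. 4 hit path for TI across all ways, then one data array plus the LRU during DR.

```python
from typing import Dict

# Hypothetical relative energy per array access (NOT from the source).
COST: Dict[str, float] = {"tag": 1.0, "state": 0.25, "data": 8.0, "lru": 0.5}


def rlu_energy(ways: int) -> float:
    """RLU: tag, data and state arrays of every way, plus the LRU."""
    return ways * (COST["tag"] + COST["data"] + COST["state"]) + COST["lru"]


def fig4_hit_energy(ways: int) -> float:
    """FIG. 4 hit path: TI over all ways, then DR on one data array (LRU kept on)."""
    ti = ways * (COST["tag"] + COST["state"])
    dr = COST["data"] + COST["lru"]
    return ti + dr


print(rlu_energy(8), fig4_hit_energy(8))  # 74.5 18.5
```

With these placeholder numbers, an 8-way hit costs about a quarter as much. The price is a second UOP, which is the added latency noted above.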

Block diagrams of caches, such as the diagram of FIG. 1, are useful to understand the logical structure and operation of a cache. While such diagrams accurately describe the logical structure of a cache, they do not describe a cache at the circuit level. As is known, when the circuits of a cache are fabricated in an integrated circuit, memory cells of the various ways often are intermixed spatially and organized into banks to conserve area in the integrated circuit. To conserve power, embodiments of the present invention provide an alternate organization scheme for cache layouts.

FIG. 5 is a block diagram illustrating cache organization according to conventional techniques. Although the dimensions of a cache 700 may vary in the number of sets, the number of ways and the size of cache entries, for illustrative purposes the cache of FIG. 5 is shown as having 1024 sets, eight ways and cache entries having 256 bits. The cache is also shown organized into eight different banks. Address decoders 710, 720 may select one of the sets in response to an input address signal. In FIG. 5, although two address decoders are shown, only one of the sets 0–1023 will be selected in response to an address signal. Addressed data may propagate from the banks to a set of latches 730–760. From the latches 730–760, the data may be read out from the cache 700.

As is known, within a set, memory cells from each of the ways 0–7 may be co-located. Thus, memory cells associated with the most significant bit from each of the eight ways may be provided in a cluster (shown as B₂₅₅) in the integrated circuit layout. Memory cells associated with the next most significant bit may be provided in a second cluster B₂₅₄. This architecture may repeat for each bit position, concluding with a cluster of memory cells for the least significant bit position B₀ in the cache. The architecture may span multiple banks in the cache 700. Thus, for set 1023, clusters B₂₅₅–B₀ are shown extending across banks 0, 2, 4 and 6. Every bank 0–7, however, stores data from each of the ways. As noted, this architecture conserves area in an integrated circuit.

In a conventional cache, when operating upon a target way in the cache, an address decoder must drive a selection signal with sufficient energy to power not only the one target way (say, way 0), but also all other ways in the cache 700, because the memory cells of all ways are provided together in clusters B₂₅₅–B₀. No prior technique permitted an address decoder to select only a portion of a cluster B₂₅₅.

FIG. 6 is a block diagram illustrating cache organization according to an embodiment of the present invention. For illustrative purposes, the cache 800 is shown as having the same dimensions as the cache of FIG. 5: 1024 sets, eight ways and cache entries having 256 bits. In this embodiment, the cache 800 also may be populated by a number of banks. Sixteen banks are shown in the example of FIG. 6. In this example, however, cache entries may be distributed throughout the banks in a manner consistent with the power conservation techniques described above.

According to an embodiment, each bank 0–15 may store data from fewer than the total number of ways in the cache. For example, as shown in FIG. 6, data from only two ways, 0 and 1, are stored in banks 0, 4, 8 and 12. Each bank may include clusters of memory cells from respective sets of each of the ways associated with that bank. Thus, bank 12 is shown as having clusters B′₂₅₅–B′₀, each of which includes two memory cells: one from a respective bit position in way 0 and another from the same bit position in way 1. Bank 13 may have a similar structure: each of its clusters B″₂₅₅–B″₀ may include two memory cells, one from a respective bit position in way 2 and another from the same bit position in way 3. The structure may be repeated throughout the remaining banks in the cache.
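The FIG. 6 assignment can be sketched as a mapping from ways to banks. The text places ways 0–1 in banks 0, 4, 8 and 12 and ways 2–3 in bank 13. The sketch extends that stride-4 pattern to ways 4–7. It also assumes, hypothetically, that the four banks holding a way pair each hold a contiguous block of 256 sets. Neither extension is stated in the description, and the function names are illustrative only.

```python
SETS = 1024
WAYS = 8
BANKS = 16
WAYS_PER_BANK = 2

GROUPS = WAYS // WAYS_PER_BANK            # way pairs {0,1}, {2,3}, {4,5}, {6,7}
BANKS_PER_GROUP = BANKS // GROUPS         # 4 banks per way pair
SETS_PER_BANK = SETS // BANKS_PER_GROUP   # 256 sets per bank (assumed split)


def banks_for_way(way: int) -> list[int]:
    """Banks that store `way`: way pair g occupies banks g, g+4, g+8, g+12 (assumed pattern)."""
    group = way // WAYS_PER_BANK
    return [group + GROUPS * k for k in range(BANKS_PER_GROUP)]


def locate(set_idx: int, way: int) -> int:
    """Bank holding (set_idx, way) under the assumed contiguous set split."""
    return banks_for_way(way)[set_idx // SETS_PER_BANK]


def cluster_cells(bank: int, bit: int) -> list[tuple[int, int]]:
    """(way, bit) cells in cluster `bit` of `bank`: one cell per way of its pair."""
    first_way = (bank % GROUPS) * WAYS_PER_BANK
    return [(way, bit) for way in range(first_way, first_way + WAYS_PER_BANK)]


if __name__ == "__main__":
    print(banks_for_way(0))        # [0, 4, 8, 12]
    print(banks_for_way(3))        # [1, 5, 9, 13]
    print(cluster_cells(12, 255))  # [(0, 255), (1, 255)], i.e. B'_255
    print(cluster_cells(13, 0))    # [(2, 0), (3, 0)], i.e. B''_0
    print(locate(1023, 1))         # 12
```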

Each bank also may be driven by a respective bank select signal. When disabled, the bank select signal prevents the respective bank from responding to address signals from the address decoders. The clusters B′₂₅₅–B′₀ and B″₂₅₅–B″₀ each are one fourth the size of the corresponding clusters B₂₅₅–B₀ in FIG. 5 (two memory cells rather than eight). Thus, the architecture of the present invention permits the address decoders to generate set selection signals with less power than, for example, the cache of FIG. 5.
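The effect of the bank select signals can be sketched as a one-hot enable vector combined with a simple load model. The model is an assumption: decoder load is taken as proportional to the number of cells that respond on the selected row. Only the enabled bank's clusters respond, so each bit position contributes two cells instead of eight. The function names are hypothetical.

```python
BANKS = 16
WAYS = 8
ENTRY_BITS = 256


def bank_select(enabled_bank: int, banks: int = BANKS) -> int:
    """One-hot bank-select vector; bit b set means bank b responds to the decoders."""
    if not 0 <= enabled_bank < banks:
        raise ValueError("bank out of range")
    return 1 << enabled_bank


def responding_cells(select: int, ways_per_bank: int, entry_bits: int = ENTRY_BITS) -> int:
    """Cells on the selected row in every enabled bank (one cluster per bit position)."""
    enabled = bin(select).count("1")
    return enabled * ways_per_bank * entry_bits


if __name__ == "__main__":
    sel = bank_select(12)
    fig6 = responding_cells(sel, ways_per_bank=2)   # 512
    fig5 = WAYS * ENTRY_BITS                        # 2048: each cluster holds all 8 ways
    print(fig6, fig5, fig6 / fig5)                  # 512 2048 0.25
```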

As may be appreciated, the structure of FIG. 6 conserves power but at the cost of additional area on the integrated circuit. In FIG. 5, when data is read from an addressed bank, only 256 data lines are required to read data from the eight ways. By contrast, in FIG. 6, the number of data lines associated with the eight ways increases by a factor of four, to 1024. Further, additional control logic, such as the bank select signals, may be provided to implement the control functions described herein.
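The factor of four follows from a simple counting model. The model is inferred from the text rather than stated in it: a bank provides one data line per bit position, shared by the ways stored in that bank, so each distinct group of ways needs its own 256 lines. The sketch below, using a hypothetical function name, reproduces the 256 and 1024 figures.

```python
WAYS = 8
ENTRY_BITS = 256


def data_lines(ways_per_bank: int, total_ways: int = WAYS, entry_bits: int = ENTRY_BITS) -> int:
    """Data lines needed to read out all ways.

    Assumed model: one line per bit position per group of ways that share a bank.
    """
    if total_ways % ways_per_bank:
        raise ValueError("ways_per_bank must divide the total number of ways")
    way_groups = total_ways // ways_per_bank
    return entry_bits * way_groups


if __name__ == "__main__":
    fig5 = data_lines(8)             # all eight ways share each cluster
    fig6 = data_lines(2)             # four separate way pairs
    print(fig5, fig6, fig6 // fig5)  # 256 1024 4
```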

The principles of the present invention, of course, are not limited to the example shown in FIG. 6. Instead of having two ways provided in each bank (e.g., ways 0–1 for bank 12), a single way may be provided in each bank or, alternatively, any number of ways per bank up to one-half the total number of ways in the cache. Each implementation achieves a different amount of power conservation. In practice, however, power conservation is but one design objective, which will be balanced against the competing objectives of conserving circuit area and maintaining simplicity in design.
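The permissible alternatives can be enumerated mechanically. The sketch below lists every ways-per-bank choice from a single way up to one-half of the ways. For each choice, it also shows the resulting groups of ways and the cluster size relative to the eight-cell clusters of FIG. 5. The sketch makes two assumptions that the text does not require. First, the number of ways per bank divides the total evenly. Second, consecutive ways are grouped together, as with ways 0–1 and 2–3 in FIG. 6. The function names are hypothetical.

```python
WAYS = 8


def bank_options(total_ways: int = WAYS) -> list[int]:
    """Ways-per-bank choices from 1 up to half the ways (even divisors only, an assumption)."""
    return [n for n in range(1, total_ways // 2 + 1) if total_ways % n == 0]


def way_groups(ways_per_bank: int, total_ways: int = WAYS) -> list[list[int]]:
    """Consecutive ways that share a bank, e.g. [[0, 1], [2, 3], ...] for two per bank."""
    return [list(range(start, start + ways_per_bank))
            for start in range(0, total_ways, ways_per_bank)]


if __name__ == "__main__":
    for n in bank_options():
        relative = n / WAYS   # cluster size relative to the 8-cell clusters of FIG. 5
        print(n, way_groups(n), relative)
```

For eight ways, this yields three options: one, two or four ways per bank. These give clusters one eighth, one quarter or one half the size of those in FIG. 5. Smaller clusters come at the price of more way groups, and so more data lines and bank select logic.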

Several embodiments of the present invention are specifically illustrated and described herein. However, it will be appreciated that modifications and variations of the present invention are covered by the above teachings and within the purview of the appended claims without departing from the spirit and intended scope of the invention.

1. A cache, having a number of sets and ways, the cache comprising a plurality of addressable banks, each bank populated by clusters of memory cells, wherein the clusters within one of the banks are associated with fewer than the total number of ways in the cache.
2. The cache of claim 1, wherein the clusters within each bank are associated with only one way.
3. The cache of claim 1, wherein the clusters within each bank are associated with two ways.
4. The cache of claim 1, wherein the clusters within each bank are associated with half the total number of ways in the cache.
5. The cache of claim 1, wherein the banks have a number of clusters equal to the number of bits stored by a cache line of the cache.
6. A cache, comprising: a plurality of cache entries organized into sets and ways, each cache entry comprising a tag field, a data field and a state field, each of the fields coupled to a respective clock line, a cache manager, and a plurality of transmission gates, one provided on each clock line, each connected to the cache manager and the cache entry.
7. The cache of claim 6, further comprising: a victim allocation unit coupled to another clock line, another transmission gate provided on the other clock line and connected to the cache manager and the victim allocation unit.
8. The cache of claim 7, wherein the victim allocation unit further comprises a plurality of eviction control registers, each of the eviction control registers to store data to be evicted from one of the ways before new data is stored.
9. The cache of claim 8, wherein the eviction control registers store data to be evicted on a least recently used basis.
10. The cache of claim 8, wherein the eviction control registers store data to be evicted on a round-robin basis.
11. The cache of claim 6, wherein the cache manager controls the transmission gates provided on the clock lines to selectively enable or disable the clock lines.
12. The cache of claim 6, wherein the cache manager is a state machine.
13. The cache of claim 6, further comprising: an address decoder having a plurality of outputs, each decoder output connected to one of the cache entries, the decoder output to enable the cache entries within all of the ways and to store data in the data fields of the enabled cache entries; and a comparator to generate a comparator output if data stored in the tag field matches a predetermined portion of an address input, the comparator output to enable data read from the data field to propagate from the way if there is a match.
14. A computer system, comprising: a system clock; and a processor cache operating in synchronism with the system clock, the processor cache comprising: a plurality of cache entries organized into sets and ways, each cache entry comprising a tag field, a data field and a state field, each of the fields coupled to a respective clock line, the respective clock line being derived from the system clock, a cache manager, and a plurality of transmission gates, one provided on each of the clock lines and connected to the cache manager and the cache entry.
15. The system of claim 14, wherein the processor cache further comprises: a victim allocation unit coupled to another clock line, another transmission gate provided on the other clock line and connected to the cache manager and the victim allocation unit.
16. The system of claim 15, wherein the victim allocation unit further comprises a plurality of eviction control registers, each of the eviction control registers to store data to be evicted from one of the ways before new data is stored.
17. The system of claim 16, wherein the eviction control registers store data to be evicted on a least recently used basis.
18. The system of claim 16, wherein the eviction control registers store data to be evicted on a round-robin basis.
19. The system of claim 14, wherein the cache manager controls the transmission gates provided on the clock lines to selectively enable or disable the clock lines.
20. The system of claim 14, further comprising: an address decoder having a plurality of outputs, each decoder output connected to one of the cache entries, the decoder output to enable the cache entries within all of the ways and to store data in the data fields of the enabled cache entries; and a comparator to generate a comparator output if data stored in the tag field matches a predetermined portion of an address input, the comparator output to enable data read from the data field to propagate from the way if there is a match.